Exploratory Factor Analysis – Exercises

This set of exercises is about exploratory factor analysis. We shall use some basic features of psych package. For quick introduction to exploratory factor analysis and psych package, we recommend this short “how to” guide.

You can download the dataset here. The data is fictitious.

Answers to the exercises are available here.

If you have different solution, feel free to post it.

Exercise 1

Load the data, install the packages psych and GPArotation which we will use in the following exercises, and load it. Describe the data.

####################
#                  #
#    Exercise 1    #
#                  #
####################

install.packages(c("psych", "GPArotation"))
library(psych)
data <- read.file("efa.csv")
describe(data)

##     vars   n mean   sd median trimmed  mad min max range  skew kurtosis
## V1     1 649 4.79 0.47      5    4.89 0.00   3   5     2 -2.16     3.96
## V2     2 649 4.77 0.53      5    4.89 0.00   2   5     3 -2.63     7.59
## V3     3 649 4.62 0.72      5    4.79 0.00   1   5     4 -2.13     4.73
## V4     4 649 4.84 0.45      5    4.96 0.00   2   5     3 -3.13    10.87
## V5     5 649 4.85 0.44      5    4.97 0.00   2   5     3 -3.31    12.40
## V6     6 649 4.83 0.48      5    4.95 0.00   2   5     3 -3.31    12.40
## V7     7 649 4.71 0.61      5    4.85 0.00   2   5     3 -2.20     4.59
## V8     8 649 4.70 0.62      5    4.85 0.00   2   5     3 -2.22     4.70
## V9     9 649 4.50 0.85      5    4.69 0.00   1   5     4 -1.97     3.86
## V10   10 649 4.69 0.72      5    4.86 0.00   1   5     4 -2.81     8.78
## V11   11 649 4.61 0.89      5    4.86 0.00   1   5     4 -2.65     6.63
## V12   12 649 4.39 1.12      5    4.67 0.00   1   5     4 -1.86     2.39
## V13   13 649 4.68 0.62      5    4.80 0.00   1   5     4 -2.27     5.96
## V14   14 649 4.37 0.80      5    4.50 0.00   1   5     4 -1.17     0.93
## V15   15 649 4.61 0.62      5    4.71 0.00   2   5     3 -1.60     2.54
## V16   16 649 4.71 0.60      5    4.85 0.00   1   5     4 -2.23     5.09
## V17   17 649 4.71 0.62      5    4.85 0.00   1   5     4 -2.42     6.32
## V18   18 649 4.72 0.60      5    4.85 0.00   1   5     4 -2.50     7.29
## V19   19 649 4.56 0.73      5    4.72 0.00   1   5     4 -1.70     2.68
## V20   20 649 4.42 0.90      5    4.60 0.00   1   5     4 -1.73     2.80
## V21   21 649 3.30 0.96      3    3.29 1.48   1   5     4 -0.16    -1.03
## V22   22 649 2.98 1.10      3    2.89 1.48   1   5     4  0.41    -0.98
## V23   23 649 3.27 1.23      3    3.29 1.48   1   5     4 -0.10    -1.11
## V24   24 649 2.69 1.01      3    2.68 1.48   1   5     4  0.33    -0.64
## V25   25 649 2.85 1.07      3    2.85 1.48   1   5     4  0.14    -0.92
## V26   26 649 3.69 0.83      4    3.75 0.00   1   5     4 -0.74     0.53
##       se
## V1  0.02
## V2  0.02
## V3  0.03
## V4  0.02
## V5  0.02
## V6  0.02
## V7  0.02
## V8  0.02
## V9  0.03
## V10 0.03
## V11 0.03
## V12 0.04
## V13 0.02
## V14 0.03
## V15 0.02
## V16 0.02
## V17 0.02
## V18 0.02
## V19 0.03
## V20 0.04
## V21 0.04
## V22 0.04
## V23 0.05
## V24 0.04
## V25 0.04
## V26 0.03

Exercise 2

Using the parallel analysis, determine the number of factors.

####################
#                  #
#    Exercise 2    #
#                  #
####################

fa.parallel(data)

## Parallel analysis suggests that the number of factors =  5  and the number of components =  3

Exercise 3

Determine the number of factors using Very Simple Structure method.

####################
#                  #
#    Exercise 3    #
#                  #
####################

vss(data)

## 
## Very Simple Structure
## Call: vss(x = data)
## VSS complexity 1 achieves a maximimum of 0.91  with  2  factors
## VSS complexity 2 achieves a maximimum of 0.93  with  3  factors
## 
## The Velicer MAP achieves a minimum of 0.01  with  3  factors 
## BIC achieves a minimum of  -742.22  with  5  factors
## Sample Size adjusted BIC achieves a minimum of  -160.17  with  8  factors
## 
## Statistics by number of factors 
##   vss1 vss2   map dof chisq     prob sqresid  fit RMSEA  BIC SABIC complex
## 1 0.88 0.00 0.014 299  2003 1.9e-249    14.6 0.88 0.095   67  1016     1.0
## 2 0.91 0.91 0.014 274  1546 3.3e-176    10.5 0.91 0.086 -228   642     1.0
## 3 0.62 0.93 0.013 250  1095 4.9e-106     9.0 0.93 0.073 -524   270     1.4
## 4 0.56 0.86 0.015 227   783  2.9e-62     8.1 0.93 0.062 -687    34     1.7
## 5 0.40 0.81 0.019 205   585  2.8e-38     7.3 0.94 0.054 -742   -91     1.9
## 6 0.40 0.81 0.022 184   502  2.4e-31     6.4 0.95 0.053 -689  -105     2.0
## 7 0.38 0.82 0.024 164   425  4.8e-25     5.7 0.95 0.050 -637  -116     2.1
## 8 0.35 0.75 0.027 145   318  5.0e-15     5.3 0.96 0.044 -621  -160     2.3
##   eChisq  SRMR eCRMS  eBIC
## 1   2201 0.072 0.075   265
## 2   1093 0.051 0.055  -682
## 3    665 0.040 0.045  -953
## 4    461 0.033 0.040 -1009
## 5    320 0.028 0.035 -1007
## 6    237 0.024 0.031  -955
## 7    162 0.020 0.028  -900
## 8    115 0.017 0.025  -824

Exercise 4

Based on normality test, is the Maximum Likelihood factoring method proper, or is OLS/Minres better? (Tip: Maximum Likelihood method requires normal distribution.)

####################
#                  #
#    Exercise 4    #
#                  #
####################

sapply(data, shapiro.test)

##           V1                            V2                           
## statistic 0.4880158                     0.4818245                    
## p.value   1.790874e-39                  1.218162e-39                 
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"                     
##           V3                            V4                           
## statistic 0.5837358                     0.4039763                    
## p.value   1.193739e-36                  1.287584e-41                 
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"                     
##           V5                            V6                           
## statistic 0.386337                      0.4010764                    
## p.value   4.917495e-42                  1.097388e-41                 
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"                     
##           V7                            V8                           
## statistic 0.5352881                     0.5343438                    
## p.value   3.876222e-38                  3.636428e-38                 
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"                     
##           V9                            V10                          
## statistic 0.631653                      0.4956688                    
## p.value   4.943564e-35                  2.898898e-39                 
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"                     
##           V11                           V12                          
## statistic 0.4952979                     0.6048789                    
## p.value   2.83163e-39                   5.900295e-36                 
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"                     
##           V13                           V14                          
## statistic 0.5630808                     0.7476871                    
## p.value   2.666746e-37                  2.854301e-30                 
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"                     
##           V15                           V16                          
## statistic 0.6405669                     0.5328645                    
## p.value   1.031309e-34                  3.290914e-38                 
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"                     
##           V17                           V18                          
## statistic 0.5253764                     0.5207042                    
## p.value   1.993172e-38                  1.462449e-38                 
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"                     
##           V19                           V20                          
## statistic 0.6448004                     0.6737761                    
## p.value   1.469911e-34                  1.828638e-33                 
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"                     
##           V21                           V22                          
## statistic 0.8599259                     0.8613114                    
## p.value   1.369292e-23                  1.744914e-23                 
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"                     
##           V23                           V24                          
## statistic 0.8992078                     0.8899322                    
## p.value   3.080095e-20                  4.157814e-21                 
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"                     
##           V25                           V26                          
## statistic 0.8948388                     0.8292828                    
## p.value   1.17986e-20                   9.748275e-26                 
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"

Exercise 5

Using oblimin rotation, 5 factors and factoring method from the previous exercise, find the factor solution. Print loadings table with cut off at 0.3.

####################
#                  #
#    Exercise 5    #
#                  #
####################

f.solution <- fa(data, nfactors=5, rotate="oblimin", fm="minres")
print(f.solution$loadings, cutoff=0.3)

## 
## Loadings:
##     MR1    MR3    MR5    MR2    MR4   
## V1          0.662                     
## V2          0.675                     
## V3          0.740                     
## V4          0.664                     
## V5          0.460                     
## V6   0.448  0.336                     
## V7   0.722                            
## V8   0.648                            
## V9   0.652                            
## V10  0.759                            
## V11  0.546                            
## V12  0.344         0.465              
## V13                0.483              
## V14                0.563              
## V15                0.566              
## V16                0.621              
## V17                0.676              
## V18                0.495              
## V19                              0.866
## V20                              0.524
## V21                       0.316       
## V22                                   
## V23                       0.820       
## V24                       0.616       
## V25                                   
## V26                       0.448       
## 
##                  MR1   MR3   MR5   MR2   MR4
## SS loadings    2.887 2.452 2.381 1.522 1.283
## Proportion Var 0.111 0.094 0.092 0.059 0.049
## Cumulative Var 0.111 0.205 0.297 0.355 0.405

Exercise 6

Plot factors loadings.

####################
#                  #
#    Exercise 6    #
#                  #
####################

plot(f.solution, title="Factor loadings")

Exercise 7

Plot structure diagram.

####################
#                  #
#    Exercise 7    #
#                  #
####################

fa.diagram(f.solution, main="Structural diagram")

Exercise 8

Find the higher-order factor model with five factors plus general factor.

####################
#                  #
#    Exercise 8    #
#                  #
####################

omega(data, nfactors = 5, sl=FALSE, title="Higher-order factor solution")

## Higher-order factor solution 
## Call: omega(m = data, nfactors = 5, title = "Higher-order factor solution", 
##     sl = FALSE)
## Alpha:                 0.91 
## G.6:                   0.94 
## Omega Hierarchical:    0.81 
## Omega H asymptotic:    0.86 
## Omega Total            0.94 
## 
## Schmid Leiman Factor loadings greater than  0.2 
##          g   F1*   F2*   F3*   F4*   F5*   h2   u2   p2
## V1    0.62        0.45                   0.58 0.42 0.65
## V2    0.62        0.46                   0.59 0.41 0.65
## V3    0.61        0.50                   0.63 0.37 0.58
## V4    0.62        0.45                   0.60 0.40 0.63
## V5    0.62        0.31                   0.50 0.50 0.77
## V6    0.70        0.23  0.23             0.62 0.38 0.80
## V7    0.72              0.37             0.65 0.35 0.79
## V8    0.74              0.33             0.66 0.34 0.82
## V9    0.76              0.33             0.69 0.31 0.83
## V10   0.77              0.38             0.76 0.24 0.79
## V11   0.59              0.28        0.22 0.46 0.54 0.74
## V12   0.64  0.27                         0.52 0.48 0.79
## V13   0.75  0.28                         0.69 0.31 0.82
## V14   0.65  0.33                         0.57 0.43 0.73
## V15   0.62  0.33                         0.50 0.50 0.76
## V16   0.67  0.36                         0.60 0.40 0.75
## V17   0.68  0.39                         0.63 0.37 0.72
## V18   0.60  0.29                         0.48 0.52 0.75
## V19   0.57                          0.69 0.79 0.21 0.41
## V20   0.60                          0.41 0.57 0.43 0.63
## V21                           0.32       0.15 0.85 0.01
## V22                           0.30       0.10 0.90 0.00
## V23-                         -0.82       0.68 0.32 0.00
## V24-                         -0.62       0.38 0.62 0.00
## V25                           0.23       0.08 0.92 0.00
## V26-                         -0.45       0.20 0.80 0.00
## 
## With eigenvalues of:
##    g  F1*  F2*  F3*  F4*  F5* 
## 8.68 0.81 1.12 0.74 1.52 0.80 
## 
## general/max  5.71   max/min =   2.06
## mean percent general =  0.55    with sd =  0.32 and cv of  0.58 
## Explained Common Variance of the general factor =  0.64 
## 
## The degrees of freedom are 205  and the fit is  0.92 
## The number of observations was  649  with Chi Square =  585.24  with prob <  2.8e-38
## The root mean square of the residuals is  0.03 
## The df corrected root mean square of the residuals is  0.03
## RMSEA index =  0.054  and the 10 % confidence intervals are  0.048 0.059
## BIC =  -742.22
## 
## Compare this with the adequacy of just a general factor and no group factors
## The degrees of freedom for just the general factor are 299  and the fit is  3.3 
## The number of observations was  649  with Chi Square =  2103.59  with prob <  3.5e-268
## The root mean square of the residuals is  0.09 
## The df corrected root mean square of the residuals is  0.09 
## 
## RMSEA index =  0.097  and the 10 % confidence intervals are  0.093 0.1
## BIC =  167.43 
## 
## Measures of factor score adequacy             
##                                                  g   F1*  F2*   F3*  F4*
## Correlation of scores with factors            0.92  0.68 0.77  0.64 0.88
## Multiple R square of scores with factors      0.84  0.46 0.59  0.41 0.77
## Minimum correlation of factor score estimates 0.67 -0.08 0.19 -0.19 0.55
##                                                F5*
## Correlation of scores with factors            0.82
## Multiple R square of scores with factors      0.68
## Minimum correlation of factor score estimates 0.35
## 
##  Total, General and Subset omega for each subset
##                                                  g  F1*  F2*  F3*  F4*
## Omega total for total scores and subscales    0.94 0.88 0.87 0.88 0.21
## Omega general for total scores and subscales  0.81 0.71 0.62 0.72 0.00
## Omega group for total scores and subscales    0.07 0.17 0.25 0.16 0.21
##                                                F5*
## Omega total for total scores and subscales    0.78
## Omega general for total scores and subscales  0.42
## Omega group for total scores and subscales    0.37

Exercise 9

Find the bifactor solution.

####################
#                  #
#    Exercise 9    #
#                  #
####################

omega(data, title="Bifactor solution")

## Bifactor solution 
## Call: omega(m = data, title = "Bifactor solution")
## Alpha:                 0.91 
## G.6:                   0.94 
## Omega Hierarchical:    0.86 
## Omega H asymptotic:    0.92 
## Omega Total            0.94 
## 
## Schmid Leiman Factor loadings greater than  0.2 
##          g   F1*   F2*   F3*   h2   u2   p2
## V1    0.59        0.44       0.54 0.46 0.64
## V2    0.58        0.48       0.56 0.44 0.59
## V3    0.54        0.56       0.63 0.37 0.47
## V4    0.56        0.54       0.60 0.40 0.52
## V5    0.61        0.37       0.50 0.50 0.73
## V6    0.68        0.35       0.59 0.41 0.79
## V7    0.72        0.21       0.56 0.44 0.91
## V8    0.74        0.22       0.61 0.39 0.92
## V9    0.77                   0.64 0.36 0.94
## V10   0.77        0.26       0.66 0.34 0.89
## V11   0.61                   0.38 0.62 0.97
## V12   0.70                   0.50 0.50 0.98
## V13   0.83                   0.70 0.30 1.00
## V14   0.75                   0.58 0.42 0.97
## V15   0.69                   0.48 0.52 0.99
## V16   0.74                   0.56 0.44 0.99
## V17   0.71                   0.53 0.47 0.97
## V18   0.65                   0.44 0.56 0.97
## V19   0.63                   0.41 0.59 0.98
## V20   0.66                   0.44 0.56 0.98
## V21                     0.34 0.13 0.87 0.02
## V22                     0.27 0.07 0.93 0.00
## V23-                   -0.81 0.66 0.34 0.00
## V24-                   -0.61 0.37 0.63 0.00
## V25                     0.23 0.06 0.94 0.00
## V26-                   -0.44 0.19 0.81 0.00
## 
## With eigenvalues of:
##    g  F1*  F2*  F3* 
## 9.28 0.03 1.55 1.53 
## 
## general/max  5.98   max/min =   57.78
## mean percent general =  0.66    with sd =  0.4 and cv of  0.6 
## Explained Common Variance of the general factor =  0.75 
## 
## The degrees of freedom are 250  and the fit is  1.72 
## The number of observations was  649  with Chi Square =  1095.18  with prob <  4.9e-106
## The root mean square of the residuals is  0.04 
## The df corrected root mean square of the residuals is  0.05
## RMSEA index =  0.073  and the 10 % confidence intervals are  0.068 0.077
## BIC =  -523.68
## 
## Compare this with the adequacy of just a general factor and no group factors
## The degrees of freedom for just the general factor are 299  and the fit is  3.37 
## The number of observations was  649  with Chi Square =  2150.99  with prob <  4.8e-277
## The root mean square of the residuals is  0.09 
## The df corrected root mean square of the residuals is  0.09 
## 
## RMSEA index =  0.099  and the 10 % confidence intervals are  0.094 0.102
## BIC =  214.83 
## 
## Measures of factor score adequacy             
##                                                  g   F1*  F2*  F3*
## Correlation of scores with factors            0.97  0.08 0.83 0.87
## Multiple R square of scores with factors      0.93  0.01 0.69 0.76
## Minimum correlation of factor score estimates 0.86 -0.99 0.38 0.53
## 
##  Total, General and Subset omega for each subset
##                                                  g  F1*  F2*  F3*
## Omega total for total scores and subscales    0.94 0.74 0.94 0.51
## Omega general for total scores and subscales  0.86 0.74 0.82 0.43
## Omega group for total scores and subscales    0.07 0.00 0.13 0.08

Exercise 10

Reduce the number of dimensions using hierarchical clustering analysis.

####################
#                  #
#    Exercise 10   #
#                  #
####################

iclust(data, title="Clastering solution")

## ICLUST (Item Cluster Analysis)
## Call: iclust(r.mat = data, title = "Clastering solution")
## 
## Purified Alpha:
##  C21  C24 
## 0.95 0.60 
## 
## G6* reliability:
## C21 C24 
##   1   1 
## 
## Original Beta:
##  C21  C24 
## 0.84 0.36 
## 
## Cluster size:
## C21 C24 
##  20   6 
## 
## Item by Cluster Structure matrix:
##       O   P   C21   C24
## V1  C21 C21  0.69 -0.01
## V2  C21 C21  0.69 -0.01
## V3  C21 C21  0.67 -0.13
## V4  C21 C21  0.68  0.02
## V5  C21 C21  0.68 -0.02
## V6  C21 C21  0.75 -0.01
## V7  C21 C21  0.75  0.03
## V8  C21 C21  0.78  0.03
## V9  C21 C21  0.80  0.01
## V10 C21 C21  0.81  0.05
## V11 C21 C21  0.62 -0.03
## V12 C21 C21  0.67  0.08
## V13 C21 C21  0.80  0.01
## V14 C21 C21  0.69 -0.07
## V15 C21 C21  0.65  0.04
## V16 C21 C21  0.72  0.08
## V17 C21 C21  0.72  0.01
## V18 C21 C21  0.65 -0.05
## V19 C21 C21  0.61 -0.05
## V20 C21 C21  0.66 -0.03
## V21 C24 C24  0.03  0.36
## V22 C24 C24  0.02  0.31
## V23 C24 C24 -0.02  0.72
## V24 C24 C24 -0.02  0.57
## V25 C24 C24  0.00  0.27
## V26 C24 C24 -0.03  0.51
## 
## With eigenvalues of:
##  C21  C24 
## 10.0  1.4 
## 
## Purified scale intercorrelations
##  reliabilities on diagonal
##  correlations corrected for attenuation above diagonal: 
##      C21   C24
## C21 0.95 -0.01
## C24 0.00  0.60
## 
## Cluster fit =  0.91   Pattern fit =  0.99  RMSR =  0.05